Current Virtual Reality (VR) environments lack the rich haptic signals that humans experience during real-life interactions, such as the sensation of texture during lateral movement on a surface. Adding realistic haptic textures to VR environments requires a model that generalizes to variations of a user's interaction and to the wide variety of existing textures in the world. Current methodologies for haptic texture rendering exist, but they usually develop one model per texture, resulting in low scalability. We present a deep learning-based action-conditional model for haptic texture rendering and evaluate its perceptual performance in rendering realistic texture vibrations through a multi-part human user study. This model is unified over all materials and uses data from a vision-based tactile sensor (GelSight) to render the appropriate surface conditioned on the user's action in real time. For rendering texture, we use a high-bandwidth vibrotactile transducer attached to a 3D Systems Touch device. The results of our user study show that our learning-based method creates high-frequency texture renderings with comparable or better quality than state-of-the-art methods without the need for learning a separate model per texture. Furthermore, we show that the method is capable of rendering previously unseen textures using a single GelSight image of their surface.
Solving real-world sequential manipulation tasks requires robots to have a repertoire of skills applicable to a wide range of circumstances. To acquire such skills using data-driven approaches, we need massive and diverse training data which is often labor-intensive and non-trivial to collect and curate. In this work, we introduce Active Task Randomization (ATR), an approach that learns visuomotor skills for sequential manipulation by automatically creating feasible and novel tasks in simulation. During training, our approach procedurally generates tasks using a graph-based task parameterization. To adaptively estimate the feasibility and novelty of sampled tasks, we develop a relational neural network that maps each task parameter into a compact embedding. We demonstrate that our approach can automatically create suitable tasks for efficiently training the skill policies to handle diverse scenarios with a variety of objects. We evaluate our method on simulated and real-world sequential manipulation tasks by composing the learned skills using a task planner. Compared to baseline methods, the skills learned using our approach consistently achieve better success rates.
Multi-object tracking is a cornerstone capability of any robotic system. Most approaches follow a tracking-by-detection paradigm. However, within this framework, detectors function in a low precision-high recall regime, ensuring a low number of false-negatives while producing a high rate of false-positives. This can negatively affect the tracking component by making data association and track lifecycle management more challenging. Additionally, false-negative detections due to difficult scenarios like occlusions can negatively affect tracking performance. Thus, we propose a method that learns shape and spatio-temporal affinities between consecutive frames to better distinguish between true-positive and false-positive detections and tracks, while compensating for false-negative detections. Our method provides a probabilistic matching of detections that leads to robust data association and track lifecycle management. We quantitatively evaluate our method through ablative experiments and on the nuScenes tracking benchmark where we achieve state-of-the-art results. Our method not only estimates accurate, high-quality tracks but also decreases the overall number of false-positive and false-negative tracks. Please see our project website for source code and demo videos:
When humans perform contact-rich manipulation tasks, customized tools are often necessary and play an important role in simplifying the task. For instance, in our daily life, we use various utensils for handling food, such as knives, forks and spoons. Similarly, customized tools for robots may enable them to more easily perform a variety of tasks. Here, we present an end-to-end framework to automatically learn tool morphology for contact-rich manipulation tasks by leveraging differentiable physics simulators. Previous work approached this problem by introducing manually constructed priors that required detailed specification of object 3D model, grasp pose and task description to facilitate the search or optimization. In our approach, we instead only need to define the objective with respect to the task performance and enable learning a robust morphology by randomizing the task variations. The optimization is made tractable by casting this as a continual learning problem. We demonstrate the effectiveness of our method for designing new tools in several scenarios such as winding ropes, flipping a box and pushing peas onto a scoop in simulation. We also validate that the shapes discovered by our method help real robots succeed in these scenarios.
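Recent research in multi-task learning reveals the benefit of solving related problems in a single neural network. 3D object detection and multi-object tracking (MOT) are two heavily intertwined problems that predict and associate object instance locations across time. However, most previous works in 3D MOT treat the detector as a separate, preceding pipeline, disjointly taking the output of the detector as the input to the tracker. In this work, we present Minkowski Tracker, a sparse spatio-temporal R-CNN that jointly solves object detection and tracking. Inspired by region-based CNN (R-CNN), we propose to solve tracking as a second stage of the object detector R-CNN that predicts the assignment probability to tracks. First, Minkowski Tracker takes 4D point clouds as input and generates a spatio-temporal bird's-eye-view (BEV) feature map through a 4D sparse convolutional encoder network. Then, our proposed TrackAlign aggregates the track region-of-interest (ROI) features from the BEV features. Finally, Minkowski Tracker updates each track and its confidence score based on the detection-to-track match probability predicted from the ROI features. We show in large-scale experiments that the overall performance gain of our method is due to four factors: 1. The temporal reasoning of the 4D encoder improves the detection performance. 2. The multi-task learning of object detection and MOT jointly enhances each other. 3. The detection-to-track match score learns an implicit motion model to enhance track assignment. 4. The detection-to-track match score improves the quality of the track confidence score. As a result, Minkowski Tracker achieves state-of-the-art performance on the nuScenes dataset tracking task without hand-designed motion models.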
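The final step described above, updating tracks and confidence scores from detection-to-track match probabilities, can be illustrated with a minimal sketch. This is not the authors' implementation: the probability matrix is assumed to be given (in the paper it is predicted from ROI features), and the greedy assignment, the `threshold` and `decay` parameters, and the function names `assign_detections` and `update_confidence` are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def assign_detections(
    match_prob: List[List[float]], threshold: float = 0.5
) -> Tuple[Dict[int, int], List[int]]:
    """Greedy one-to-one assignment (an assumed, simplified strategy).

    match_prob[d][t] is the probability that detection d belongs to track t.
    Returns a detection-to-track mapping and the list of unmatched detections.
    """
    # Collect all pairs above the threshold, highest probability first.
    candidates = sorted(
        (
            (p, d, t)
            for d, row in enumerate(match_prob)
            for t, p in enumerate(row)
            if p >= threshold
        ),
        key=lambda c: (-c[0], c[1], c[2]),
    )
    det_to_track: Dict[int, int] = {}
    used_tracks = set()
    for _, d, t in candidates:
        if d in det_to_track or t in used_tracks:
            continue  # each detection and each track is used at most once
        det_to_track[d] = t
        used_tracks.add(t)
    unmatched = [d for d in range(len(match_prob)) if d not in det_to_track]
    return det_to_track, unmatched


def update_confidence(
    track_conf: List[float],
    det_to_track: Dict[int, int],
    match_prob: List[List[float]],
    decay: float = 0.9,
) -> List[float]:
    """Hypothetical update rule: matched tracks take the match probability,
    unmatched tracks have their confidence multiplied by `decay`."""
    new_conf = [c * decay for c in track_conf]
    for d, t in det_to_track.items():
        new_conf[t] = match_prob[d][t]
    return new_conf


if __name__ == "__main__":
    probs = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.3]]
    mapping, unmatched = assign_detections(probs)
    print(mapping, unmatched)
    print(update_confidence([0.5, 0.5], mapping, probs))
```

In this sketch, detections left unmatched would be candidates for starting new tracks; how track birth and death are actually handled is not specified in the abstract.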
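Neural Radiance Fields (NeRFs) have recently emerged as a powerful paradigm for the representation of natural, complex 3D scenes. NeRFs represent continuous volumetric density and RGB values in a neural network, and generate photo-realistic images from unseen camera viewpoints through ray tracing. We propose an algorithm for navigating a robot through a 3D environment represented as a NeRF using only an on-board RGB camera for localization. We assume the NeRF for the scene has been pre-trained offline, and the robot's objective is to navigate through unoccupied space in the NeRF to reach a goal pose. We introduce a trajectory optimization algorithm that avoids collisions with high-density regions in the NeRF, based on a discrete-time version of differential flatness that is amenable to constraining the robot's full pose and control inputs. We also introduce an optimization-based filtering method to estimate the 6DoF pose and velocities of the robot in the NeRF given only an on-board RGB camera. We combine the trajectory planner with the pose filter in an online replanning loop to give a vision-based robot navigation pipeline. We present simulation results with a quadrotor robot navigating through a jungle gym environment, the inside of a church, and Stonehenge using only an RGB camera. We also demonstrate an omnidirectional ground robot navigating through the church, requiring it to reorient to fit through a narrow gap. Videos of this work can be found at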